Design of a 2D DCT/IDCT application specific VLIW processor supporting scaled and sub-sampled blocks
Authors
Abstract
We present a design for an accurate 2D DCT/IDCT processor that handles scaled and sub-sampled input blocks efficiently. In IDCT mode, the latency of the processor scales with the size of the input block, from 7 cycles for a 1x1 block to 38 cycles for an 8x8 block. This scalability is possible because the processor has input-data-dependent control, which lets it exploit the reduced computational needs of sub-sampled and smaller blocks to finish in fewer cycles. This feature is useful for MPEG and HDTV decoders and has not previously been exploited. Clocked at 150 MHz, the processor satisfies the high sample rate requirement of dual MPEG stream HD decoding with a picture size of 1920 x 1080 at 30 frames per second. Fixed word length and accuracy simulations of our design show that it conforms to the accuracy specifications of the CCITT standard within a 16-bit data path. A methodology based on architecture-level synthesis is used to design the VLIW processor core. The VLIW design efficiently exploits the instruction-level parallelism present in the DCT/IDCT application. The processor core has an area of 0.834 mm² and runs at 150 MHz in 0.18 micron CMOS technology.
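The sample rate claim can be checked with a rough cycle budget. The sketch below is an illustration only, not a calculation taken from the paper. It assumes 4:2:0 chroma sub-sampling (1.5 samples per pixel), full 8x8 blocks throughout, and no overlap between consecutive blocks, so each block costs its full worst-case latency. All names in the code are hypothetical.

```python
# Back-of-envelope cycle budget for dual-stream HD decoding.
# Figures marked "abstract" come from the abstract; the rest are assumptions.

CLOCK_HZ = 150_000_000        # abstract: 150 MHz clock
WIDTH, HEIGHT = 1920, 1080    # abstract: picture size
FRAMES_PER_SECOND = 30        # abstract: frame rate
STREAMS = 2                   # abstract: dual MPEG stream
WORST_CASE_CYCLES = 38        # abstract: latency of an 8x8 block in IDCT mode
BLOCK_SAMPLES = 8 * 8         # samples in one 8x8 block


def blocks_per_second() -> int:
    """8x8 blocks per second, assuming 4:2:0 sampling (luma + 2 quarter-size chroma planes)."""
    samples_per_frame = WIDTH * HEIGHT * 3 // 2   # assumption: 1.5 samples per pixel
    samples_per_second = samples_per_frame * FRAMES_PER_SECOND * STREAMS
    return samples_per_second // BLOCK_SAMPLES


def cycles_available_per_block() -> float:
    """Clock cycles the processor can spend on each block at the given clock rate."""
    return CLOCK_HZ / blocks_per_second()


if __name__ == "__main__":
    budget = cycles_available_per_block()
    print(f"blocks per second: {blocks_per_second()}")
    print(f"cycle budget per block: {budget:.2f}")
    print(f"worst-case latency fits: {WORST_CASE_CYCLES <= budget}")
```

Under these assumptions the budget is about 51 cycles per block, so the 38-cycle worst case fits even without pipelining between blocks. Smaller or sub-sampled blocks, which the abstract says take as few as 7 cycles, would leave further headroom.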
Similar resources
A simple processor core design for DCT/IDCT
This paper presents a cost-effective processor core design that features the simplest hardware and is suitable for discrete cosine transform/inverse discrete cosine transform (DCT/IDCT) operations in H.263 and digital cameras. This design combines a fast direct two-dimensional DCT algorithm, bit-level adder-based distributed arithmetic, and common subexpression sharing to reduce th...
VLSI IMPLEMENTATION OF ARITHMETIC COSINE TRANSFORM IN FPGA TECHNOLOGY B.Bhavani
In image processing, image compression can improve the performance of digital systems by reducing the cost and time of image storage and transmission without a significant reduction in image quality. This paper describes a low-complexity hardware architecture for the Discrete Cosine Transform (DCT) for image compression [6]. In this DCT architecture, common computations are identi...
Unified Architecture for 8×8 DCT / IDCT with Register-Based Matrix Transposition
In this paper, a unified processor architecture supporting 8×8 DCT and IDCT is proposed. The architecture is based on row-column decomposition of the two-dimensional transform. The resulting modular architecture can be efficiently pipelined. Moreover, the architecture, including the register-based matrix transposition network, operates on a sequential data stream.
A Low-Power and Low-Complexity DCT/IDCT VLSI Architecture Based On Backward Chebyshev Recursion
A low-power parallel VLSI structure for DCT/IDCT is proposed. By treating the transformations as the evaluation of the Chebyshev series, and exploiting the Backward Chebyshev Recursion (BCR), we can reduce the total number of multipliers (N + 1 for IDCT, 2N 2 for DCT). The property of BCR is also used to compute the DCT/IDCT through the down-sampled even and odd sequences. Since the operation f...
Context-Aware Fast 3D DCT/IDCT Algorithm for Low-power Video Codec in Mobile Embedded Systems
A context-aware fast algorithm for 3D DCT/IDCT video coding is presented in this paper. Compared to state-of-the-art MPEGx/H.26x hybrid schemes, the new algorithm is suited to low-power video codecs in mobile embedded systems. 1. 3D DCT for Low-power Video Codecs The 2D-DCT is a powerful tool for reducing the spatial information redundancy in 2D images: intraframe coding of still images and videos i...
Journal title:
Volume, issue:
Pages: -
Publication date: 2003